First, I congratulate the authors for a truly stimulating paper. The paper
resolves a number of important questions but, at the same time, raises many
others. I would like to focus my comments on two specific points.

1. The similarity of Stagewise and LARS fitting to the Lasso suggests
that the estimates produced by Stagewise and LARS fitting may minimize
an objective function that is similar to the appropriate Lasso objective
function. It is not at all obvious (at least to me) how this might work.
I note, though, that the construction of such an objective function may be
easier than it seems. For example, in the case of bagging [Breiman (1996)]
or subagging [Bühlmann and Yu (2002)], an "implied" objective function
can be constructed. Suppose that $\hat\theta_1, \ldots, \hat\theta_m$ are estimates (e.g., computed
from subsamples or bootstrap samples) that minimize, respectively, objective
functions $Z_1, \ldots, Z_m$ and define

$$\hat\theta = g(\hat\theta_1, \ldots, \hat\theta_m);$$

then $\hat\theta$ minimizes the objective function

$$Z(t) = \inf\{Z_1(t_1) + \cdots + Z_m(t_m) : g(t_1, \ldots, t_m) = t\}.$$

(Thanks to Gib Bassett for pointing this out to me.) A similar construction
for stagewise fitting (or LARS in general) could facilitate the analysis of the
statistical properties of the estimators obtained via these algorithms.
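
The implied-objective construction in point 1 can be checked numerically in a toy case. The following is only a sketch under stated assumptions: the two objective functions `Z1` and `Z2`, the choice of a plain average for `g`, and the finite grid are all hypothetical and chosen for illustration; the infimum is approximated by brute force over the grid rather than computed analytically.

```python
from itertools import product

# Hypothetical objective functions, chosen only for illustration.
def Z1(t):
    return (t - 0.5) ** 2

def Z2(t):
    return abs(t - 2.0)

def g(ts):
    """Combine the individual estimates; here a plain average, as in bagging."""
    return sum(ts) / len(ts)

def implied_objective(objectives, t, grid):
    """Brute-force Z(t) = inf{ Z_1(t_1) + ... + Z_m(t_m) : g(t_1, ..., t_m) = t }."""
    best = float("inf")
    for ts in product(grid, repeat=len(objectives)):
        if g(ts) == t:  # exact comparison is safe: grid points are multiples of 0.25
            best = min(best, sum(Z(ti) for Z, ti in zip(objectives, ts)))
    return best

grid = [k / 4 for k in range(-8, 17)]  # -2.0, -1.75, ..., 4.0
objectives = [Z1, Z2]

# Individual minimizers and their combination.
theta_i = [min(grid, key=Z) for Z in objectives]
theta_hat = g(theta_i)

# Minimizer of the implied objective over the same grid.
implied_argmin = min(grid, key=lambda t: implied_objective(objectives, t, grid))
print(theta_i, theta_hat, implied_argmin)
```

The reason the two minimizers agree is immediate from the definition: every feasible sum is at least the sum of the individual minima, and that lower bound is attained only when each argument sits at its own minimizer, which forces the combined value to equal the combination of the individual estimates.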

2. When I first started experimenting with the Lasso, I was impressed 
by its robustness to small changes in its tuning parameter relative to more 
classical stepwise subset selection methods such as Forward Selection and 
Backward Elimination. (This is well illustrated by Figure 5; at its best, 
Forward Selection is comparable to LARS, Stagewise and the Lasso but 
the performance of Forward Selection is highly dependent on the model 
size.) Upon reflection, I realized that there was a simple explanation for this 
robustness. Specifically, the strict convexity in $\beta$ for each $t$ in the Lasso
objective function (1.5) together with the continuity (in the appropriate
sense) in $t$ of these objective functions implies that the Lasso solutions $\hat\beta(t)$
are continuous in $t$; this continuity breaks down for nonconvex objective
functions. Of course, the same can be said of other penalized least squares 
estimates whose penalty is convex. What seems to make the Lasso special 
is (i) its ability to produce exact 0 estimates and (ii) the "fact" that its
bias seems to be more controllable than it is for other methods (e.g., ridge 
regression, which naturally overshrinks large effects) in the sense that for a 
fixed tuning parameter the bias is bounded by a constant that depends on 
the design but not the true parameter values. At the same time, though, 
it is perhaps unfair to compare stepwise methods to the Lasso, LARS or 
Stagewise fitting since the space of models considered by the latter methods 
seems to be "nicer" than it is for the former and (perhaps more important) 
since the underlying motivation for using Forward Selection is typically not 
prediction. For example, bagged Forward Selection might perform as well as 
the other methods in many situations. 
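
The continuity and bias remarks in point 2 can be illustrated in the simplest possible setting. The sketch below makes several assumptions that are not in the discussion itself: a single coefficient under an orthonormal design, the penalized (Lagrangian) form of the Lasso with tuning parameter `lam` rather than the constrained form (1.5) indexed by $t$, and hard thresholding as a stand-in for subset selection. Here `z` plays the role of the least squares coefficient, and "bias" means the deterministic shrinkage away from `z`.

```python
def soft_threshold(z, lam):
    """Lasso-type estimate for one coefficient under an orthonormal design."""
    mag = max(abs(z) - lam, 0.0)
    return mag if z >= 0 else -mag

def hard_threshold(z, lam):
    """Subset-selection-type estimate: keep z unchanged or drop it entirely."""
    return z if abs(z) > lam else 0.0

def ridge_shrink(z, lam):
    """Ridge-type estimate: proportional shrinkage of z."""
    return z / (1.0 + lam)

# (a) Sensitivity to a small change in the tuning parameter near the threshold.
z = 1.0
lam_lo, lam_hi = 0.99, 1.01
soft_jump = abs(soft_threshold(z, lam_lo) - soft_threshold(z, lam_hi))
hard_jump = abs(hard_threshold(z, lam_lo) - hard_threshold(z, lam_hi))

# (b) Shrinkage at a fixed tuning parameter as the effect size grows.
lam = 1.0
effects = [2.0, 20.0, 200.0]
soft_bias = [abs(v - soft_threshold(v, lam)) for v in effects]
ridge_bias = [abs(v - ridge_shrink(v, lam)) for v in effects]

print(soft_jump, hard_jump)
print(soft_bias, ridge_bias)
```

In this toy setting the soft-thresholded estimate moves by no more than the change in `lam`, while the hard-thresholded estimate jumps by the full size of `z`; likewise the soft-threshold shrinkage never exceeds `lam`, whereas the ridge shrinkage grows in proportion to the effect.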